{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "# Pyhon模块、包和程序\n",
    "\n",
    "本章学习如何写出实用的大型Python程序。\n",
    "\n",
    "## 之前的独立程序\n",
    "\n",
    "```python\n",
    "#!/usr/bin/env python3.6\n",
    "\n",
    "\"\"\"\n",
    "hello_world.py\n",
    "\"\"\"\n",
    "\n",
    "\n",
    "def main():\n",
    "    print('hello, world!')\n",
    "\n",
    "\n",
    "if __name__ = '__main__':\n",
    "    main()\n",
    "```\n",
    "\n",
    "## 命令行工具\n",
    "\n",
    "Python 作为一种脚本语言，可以非常方便地用于系统（尤其是*nix系统）命令行工具的开发。Python 自身也集成了一些标准库，专门用于处理命令行相关的问题。\n",
    "\n",
    "\n",
    "### 标准输入输出\n",
    "\n",
    "*nix 系统中，一切皆为文件，因此标准输入、输出可以完全可以看做是对文件的操作。标准化输入可以通过管道（pipe）或重定向（redirect）的方式传递："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": false,
    "ein.hycell": false,
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "#!/usr/bin/env python3.6\n",
    "\n",
    "# reverse.py\n",
    "\n",
    "import sys\n",
    "\n",
    "for l in sys.stdin.readlines():\n",
    "    sys.stdout.write(l[::-1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "将上面的代码保存为`reverse.py`，通过管道`|`传递：\n",
    "\n",
    "```sh\n",
    "chmod +x reverse.py\n",
    "cat reverse.py | ./reverse.py\n",
    "```\n",
    "\n",
    "```\n",
    "6.3nohtyp vne/nib/rsu/!#\n",
    "\n",
    "yp.esrever #\n",
    "\n",
    "sys tropmi\n",
    "\n",
    ":)(senildaer.nidts.sys ni l rof\n",
    ")]1-::[l(etirw.tuodts.sys\n",
    "```\n",
    "\n",
    "通过重定向`<`传递：\n",
    "\n",
    "```sh\n",
    "./reverse.py < reverse.py\n",
    "# 输出结果同上\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "### 命令行参数\n",
    "\n",
    "一般在命令行后追加的参数可以通过`sys.argv`获取，`sys.argv`是一个列表，其中第一个元素为当前脚本的文件名："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": false,
    "ein.hycell": false,
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "#!/usr/bin/env python3.6\n",
    "\n",
    "import sys\n",
    "\n",
    "print(sys.argv)  #下面返回的是Jupyter运行的结果"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "运行上面的脚本：\n",
    "\n",
    "```sh\n",
    "chmod +x argv.py\n",
    "./argv.py hello world\n",
    "# ['./argv.py', 'hello', 'world']\n",
    "python argv.py hello world\n",
    "# ['argv.py', 'hello', 'world']\n",
    "```\n",
    "\n",
    "对于比较复杂的命令行参数，例如通过`--option`传递的选项参数，如果是对`sys.argv`逐项进行解析会很麻烦，Python 提供标准库`argparse`（旧的库为`optparse`，已经停止维护）专门解析命令行参数。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "## 模块和包\n",
    "\n",
    "模块和包是大型项目的核心，如何组织包、将大型模块分解成多个文件对于合理组织代码结构非常重要。\n",
    "\n",
    "一个**模块**就是一个Python代码文件。把多个模块组织成文件层次，称之为**包**。\n",
    "\n",
    "### 导入模块\n",
    "\n",
    "使用`import`语句来进行模块导入，模块是不带`.py`拓展的另外一个Python文件的文件名。先来看下面的例子，这里先将代码组织成由很多分层模块构成的包。在文件系统上组织代码，并确保每一个目录都定义了一个`__init__.py`文件，这个文件可以是空的，其目的是要包含不同运行级别的包的可选的初始化代码。举个例子，如果你执行了语句`import graphics`，文件`graphics/__init__.py`将被导入，建立`graphics`命名空间的内容。像`import graphics.formats.jpg`这样导入，文件`graphics/__init__.py`和文件`graphics/graphics/formats/__init__.py`将在文件`graphics/formats/jpg.py`导入之前导入。\n",
    "\n",
    "```\n",
    "graphics/\n",
    "    __init__.py\n",
    "    primitive/\n",
    "        __init__.py\n",
    "        line.py\n",
    "        fill.py\n",
    "        text.py\n",
    "    formats/\n",
    "        __init__.py\n",
    "        png.py\n",
    "        jpg.py\n",
    "```\n",
    "\n",
    "之后，就可以使用各种`import`语句来导入模块。\n",
    "\n",
    "```python\n",
    "import graphics.primitive.line # 直接导入模块，须按照“模块名. ”的用法来使用\n",
    "from graphics.primitive import line # 导入模块的一部分\n",
    "import graphics.formats.jpg as jpg # 使用别名导入模块\n",
    "from graphics import *  # 导入模块的全部内容\n",
    "```\n",
    "\n",
    "### 在函数内部导入模块"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": false,
    "ein.hycell": false,
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "def get_random():\n",
    "    import random\n",
    "    possibilities = ['a', 'b', 'c', 'd']\n",
    "    return random.choice(possibilities)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "大家都比较习惯的在函数外部导入："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": false,
    "ein.hycell": false,
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "import random\n",
    "def get_random():\n",
    "    possibilities = ['a', 'b', 'c', 'd']\n",
    "    return random.choice(possibilities)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "如果被导入的代码被多个地方多次使用，就应该考虑在函数外部导入；如果被导入的代码只在函数内部使用，就在函数内部导入。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "### 模块搜索路径\n",
    "\n",
    "Python使用存储在`sys`模块下的目录名和zip压缩文件列表作为`path`变量，这个列表可以被读取和修改。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": false,
    "ein.hycell": false,
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "import sys\n",
    "\n",
    "for pth in sys.path:\n",
    "    print(pth)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "最开始的空白输出行是空字符串''，代表当前目录。如果空字符串是在`sys.path`的开始位置，Python会先搜索当前目录。Python会根据列表元素的位置优先导入前面目录中存在的模块，如果模块同名，后面路径中出现的模块则不会被导入。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "### 使用相对路径导入模块\n",
    "\n",
    "在包内，既可以使用绝对路径来导入也可以使用相对路径来导入。可以使用包的相对导入，使一个的模块导入同一个包的另一个模块。比如下面的例子，假设在文件系上有`mypackage`包，组织如下：\n",
    "\n",
    "```\n",
    "mypackage/\n",
    "    __init__.py\n",
    "    A/\n",
    "        __init__.py\n",
    "        spam.py\n",
    "        grok.py\n",
    "    B/\n",
    "        __init__.py\n",
    "        bar.py\n",
    "```\n",
    "\n",
    "如果模块`mypackage.A.spam`要导入同目录下的模块`grok`，它应该包括的`import`语句如下:\n",
    "\n",
    "```python\n",
    "# mypackage/A/spam.py\n",
    "\n",
    "from . import grok\n",
    "```\n",
    "\n",
    "如果模块`mypackage.A.spam`要导入不同目录下的模块`B.bar`，它应该使用的`import`语句如下:\n",
    "\n",
    "```python\n",
    "# mypackage/A/spam.py\n",
    "\n",
    "from ..B import bar\n",
    "```\n",
    "\n",
    "两个`import`语句都没包含顶层包名，而是使用了`spam.py`的相对路径。\n",
    "\n",
    "下面是同时使用绝对路径导入和相对路径导入的例子：\n",
    "\n",
    "```python\n",
    "# mypackage/A/spam.py\n",
    "\n",
    "from mypackage.A import grok # OK\n",
    "from . import grok # OK\n",
    "import grok # ImportError (not found)\n",
    "```\n",
    "\n",
    "像`mypackage.A`这样使用绝对路径名的不利之处是这将顶层包名硬编码到你的源码中。如果想重新组织它，你的代码将更加脆弱，很难工作。举个例子，如果改变了包名，你就必须检查所有文件来修正源码。同样，硬编码的名称会使移动代码变得困难。举个例子，也许有人想安装两个不同版本的软件包，只通过名称区分它们。如果使用相对导入，那一切都ok，然而使用绝对路径名很可能会出问题。\n",
    "\n",
    "`import`语句中的`.`和`..`语法跟shell中的当前目录和上级目录比较类似，把它们想象成指定目录名即可。`.`意味着在当前目录中查找，而`..B`表示在`../B`目录中查找。这种语法只能用在`from xx import yy`这样的导入语句中。\n",
    "\n",
    "```python\n",
    "from . import grok  # OK\n",
    "import .grok  # ERROR\n",
    "```\n",
    "\n",
    "相对导入不允许跳出定义包的那个目录，即利用句点的组合形式进入一个不是Python包的目录会出现导入错误。\n",
    "\n",
    "另外，相对导入只在特定的条件下才起作用，即，模块必须位于一个合适的包中才可以。特别的，位于脚本顶层目录的模块不能使用相对导入。\n",
    "\n",
    "此外，如果包的某个部分是直接以脚本的形式执行的，这种情况也不能使用相对导入。\n",
    "\n",
    "如：\n",
    "\n",
    "```sh\n",
    "$ python3 mypackage/A/spam.py  # relative imports fails\n",
    "```\n",
    "\n",
    "但是可以使用`-m`选项来执行上面的脚本：\n",
    "\n",
    "```sh\n",
    "$ python3 -m mypackage.A.spam  # relative imports work\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "## Python标准库\n",
    "\n",
    "Python具有庞大的标准库，Python的标准库和Python语言核心一起构成的Python语言。Python提供了标准库各模块的官方文档（ https://docs.python.org/3/library ）以及使用指南（ https://docs.python.org/3.6/tutorial/stdlib.html ）。另外， *Doug Hellmann* 的网站*Python 3 Module of the Week*（ https://pymotw.com/3/ ）和他的书 *The Python 3 Standard Library by Example*（中文版《Python 3标准库》由机械工业出版社于2018年10月11日出版）都是非常有帮助的指南，其针对标准库模块展示了大量的代码实例。\n",
    "\n",
    "教材中展示的一些常用的标准库模块：\n",
    "\n",
    "* `collections.defaultdict` 创建包含键默认值的字典，其参数是一个函数（可以是lambda函数），返回赋给缺失键的值。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": false,
    "ein.hycell": false,
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "from collections import defaultdict\n",
    "food_counter = defaultdict(int)\n",
    "for food in ['spam', 'spam', 'eggs', 'spam']:\n",
    "    food_counter[food] += 1\n",
    "\n",
    "for food, count in food_counter.items():\n",
    "    print(food, count)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "* `collections.Counter` 提供了计数器功能。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": false,
    "ein.hycell": false,
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "from collections import Counter\n",
    "alpha_counter = Counter('aaabbbccc')\n",
    "alpha_counter"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "函数`most_common()`以降序返回所有元素，可以给其指定一个数字参数，返回排名在该数字之前的元素。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": false,
    "ein.hycell": false,
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "alpha_counter.most_common(2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "另外，可以针对两个或多个计数器进行组合，其也支持类似集合元算的求交、并、差运算。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": false,
    "ein.hycell": false,
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "alpha_counter2 = Counter('abcddd')\n",
    "alpha_counter + alpha_counter2  # Counter({'a': 4, 'b': 4, 'c': 4, 'd': 3})，计数组合\n",
    "alpha_counter - alpha_counter2  # Counter({'a': 2, 'b': 2, 'c': 2})，计数相减\n",
    "alpha_counter & alpha_counter2  # Counter({'a': 1, 'b': 1, 'c': 1})，两者交集中取两者中较小计数\n",
    "alpha_counter | alpha_counter2  # Counter({'a': 3, 'b': 3, 'c': 3, 'd': 3}), 两者并集中取两者中较大计数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "* `collections.OrderedDict` 有序字典，记忆字典键添加的顺序，然后从一个迭代器按照相同的顺序返回。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": false,
    "ein.hycell": false,
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "from collections import OrderedDict\n",
    "quotes = OrderedDict([\n",
    "    ('Moe', 'A wise guy, huh?'),\n",
    "    ('Larry', 'Ow!'),\n",
    "    ('Curly', 'Nyuk nyuk!'),\n",
    "])\n",
    "for stooge in quotes:\n",
    "    print(stooge)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": false,
    "ein.hycell": false,
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "quotes = dict([\n",
    "    ('Moe', 'A wise guy, huh?'),\n",
    "    ('Larry', 'Ow!'),\n",
    "    ('Curly', 'Nyuk nyuk!'),\n",
    "])\n",
    "for stooge in quotes:\n",
    "    print(stooge)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "* `collections.deque`双端队列，同时具有栈和队列的特征，可从序列的任何一端添加和删除项。函数`popleft()`去掉最左边的项并返回该项，`pop()`去掉最右边的项并返回该项。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": false,
    "ein.hycell": false,
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "def palindrome(word):\n",
    "    \"\"\"检测回文\"\"\"\n",
    "    from collections import deque\n",
    "    dq = deque(word)\n",
    "    while len(dq) > 1:\n",
    "        if dq.popleft() != dq.pop():\n",
    "            return False\n",
    "    return True\n",
    "\n",
    "\n",
    "def another_palindrome(word):\n",
    "    \"\"\"检测回文\"\"\"\n",
    "    return word == word[::-1]\n",
    "print(palindrome('racecar'))\n",
    "print(another_palindrome('racecar'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "* `itertools` 迭代器函数，每次返回一项，并记住当前调用的状态。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": false,
    "ein.hycell": false,
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "import itertools\n",
    "help(itertools)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": false,
    "ein.hycell": false,
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "for item in itertools.chain([1, 2], ['a', 'b']):\n",
    "    print(item)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": false,
    "ein.hycell": false,
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "for item in itertools.accumulate([1, 2, 3, 4], lambda x, y: x * y):\n",
    "    print(item)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "* `pprint` 友好打印"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": false,
    "ein.hycell": false,
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "quotes = OrderedDict([\n",
    "    ('Moe', 'A wise guy, huh?'),\n",
    "    ('Larry', 'Ow!'),\n",
    "    ('Curly', 'Nyuk nyuk!'),\n",
    "])\n",
    "print(quotes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": false,
    "ein.hycell": false,
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "from pprint import pprint\n",
    "pprint(quotes)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "### 获取第三方Python代码\n",
    "\n",
    "* **Pypi** (https://pypi.python.org)\n",
    "* **Github** (https://github.com)\n",
    "* **ReadTheDocs** (https://readthedocs.org)\n",
    "* **activestate** (http://code.activestate.com/recipes/langs/python)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  },
  "name": "8_modules_pacakges_programs.ipynb"
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
